Support newer versions of MedCalc-Bench #3921
Hi @yifanmai, @MiguelAFH. MedCalc-Bench v1.0 is returning a 404 error here: https://huggingface.co/datasets/ncbi/MedCalc-Bench-v1.0. Could you please review this PR? Thanks.
|
The link you sent, https://huggingface.co/datasets/ncbi/MedCalc-Bench-v1.0, does not return a 404. Could you clarify what you meant? As for upgrading to MedCalc-Bench-v2.0, I am OK with this, but it should be a separate new run spec function in order to maintain backward compatibility. Users running evals with the existing MedCalc-Bench should not see any changes.
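To make the backward-compatibility requirement concrete, here is a minimal, self-contained sketch of a run spec function that takes a version parameter and defaults to v1.0. It does not use HELM's actual API: `RunSpecSketch`, `DATASET_PATHS`, and `medcalc_bench_run_spec` are hypothetical names, and the v2.0 dataset path is assumed by analogy with the v1.0 link above.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a run spec; this is NOT HELM's RunSpec class.
@dataclass(frozen=True)
class RunSpecSketch:
    name: str
    dataset_path: str


# Assumption: the v2.0 path mirrors the v1.0 path linked in this thread.
DATASET_PATHS = {
    "1.0": "ncbi/MedCalc-Bench-v1.0",
    "2.0": "ncbi/MedCalc-Bench-v2.0",
}


def medcalc_bench_run_spec(version: str = "1.0") -> RunSpecSketch:
    """Build a run spec for the requested dataset version.

    The default stays at "1.0", so callers that pass no version
    get exactly the same name and dataset as before.
    """
    if version not in DATASET_PATHS:
        raise ValueError(f"Unsupported MedCalc-Bench version: {version}")
    # Keep the original name for v1.0; tag newer versions explicitly.
    if version == "1.0":
        name = "medcalc_bench"
    else:
        name = f"medcalc_bench:version={version}"
    return RunSpecSketch(name=name, dataset_path=DATASET_PATHS[version])
```

With this shape, existing runs keep resolving to the v1.0 dataset, while a newer version has to be requested explicitly, for example `medcalc_bench_run_spec("2.0")`.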
Hi @yifanmai, a few weeks ago I encountered a 404 error and saw that a version 2.0 dataset was available. Since then, there have also been additional updates, versions 1.1 and 1.2. As you suggested, I will create new run specs for medcalc_bench_v1.1 and medcalc_bench_v1.2. Thanks, Regards
|
Great, thanks for the update. |
|
Hi, I was about to raise an issue suggesting the new MedCalc dataset, but it seems someone got here before me. I have made a few more changes to the MedCalc-Bench dataset since v1.2, and you can find the newest dataset here: https://github.com/nikhilk7153/MedCalc-Bench-Verified. All future updates will be made in this new repo. MedCalc-Bench Verified is an updated version of v1.2. You can find the changes in the Verified version in the release notes: https://github.com/nikhilk7153/MedCalc-Bench-Verified/releases/tag/MedCalc-Bench-Verified
|
Hi @yifanmai can you please review this PR? Thanks |
|
class MATHScenario(Scenario):
-    """
+    r"""
Revert unrelated change.
|
Are there any plans to update MedHELM, @yifanmai? Would it be possible to use MedCalc-Bench Verified instead of the v1.0 currently in use? We have fixed annotation and ground-truth label issues in approximately one third of the dataset, so re-running would give a more accurate picture of the landscape.
|
I think this is more of a question for @MiguelAFH. The official evals and results are maintained by them, so it depends on whether there is funding and bandwidth available for this.
…unction with version parameter
|
Hi @yifanmai, I have made the requested updates. Could you please review this pull request? Thanks Regards |
Thanks for the heads up on the benchmark updates. We are currently working on a new release for when the paper comes out in the coming months - we will talk about this internally and let you know if there's enough bandwidth for it. |
yifanmai
left a comment
Looks good. Thank you!
This pull request updates the MedCalc-Bench scenario to use the latest dataset version and adds a basic test for the scenario. The main changes focus on keeping the dataset reference current and improving test coverage.
MedCalc-Bench scenario updates:
- Updated the dataset reference to use MedCalc-Bench-v2.0 instead of v1.0 in medcalc_bench_scenario.py.
Testing improvements:
- Added test_medcalc_bench_scenario.py with a pytest-based test that verifies the scenario loads instances and that the first instance is from the "test" split.
Documentation formatting:
- Updated the docstring of the MATHScenario class to use a raw string for improved formatting.
@yifanmai @MiguelAFH
Could you please review this PR?
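For illustration only, the two checks described for the new test (instances are loaded, and the first one belongs to the "test" split) might look roughly like the sketch below. It is not the test file from this PR: `InstanceSketch` and `load_instances_sketch` are hypothetical stand-ins that return an in-memory placeholder row instead of downloading the real dataset.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical stand-in for a scenario instance; not HELM's Instance class.
@dataclass(frozen=True)
class InstanceSketch:
    question: str
    split: str


def load_instances_sketch() -> List[InstanceSketch]:
    """Return placeholder instances in place of the real dataset download."""
    # Placeholder row only; this is not real MedCalc-Bench data.
    return [InstanceSketch(question="placeholder question", split="test")]


def test_scenario_loads_test_split_instances() -> None:
    instances = load_instances_sketch()
    # The scenario should yield at least one instance.
    assert len(instances) > 0
    # The first instance should come from the "test" split.
    assert instances[0].split == "test"
```

Because the function name starts with `test_`, pytest would collect it automatically when the file is named like `test_*.py`.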